AITopics | unique word

Collaborating Authors

unique word

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

2d1b2a5ff364606ff041650887723470-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 07:08:15 GMT

artificial intelligence, dataset, machine learning, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Linguistic Analysis of Sinhala YouTube Comments on Sinhala Music Videos: A Dataset Study

De Mel, W. M. Yomal, de Silva, Nisansa

arXiv.org Artificial IntelligenceJan-28-2025

This research investigates the area of Music Information Retrieval (MIR) and Music Emotion Recognition (MER) in relation to Sinhala songs, an underexplored field in music studies. The purpose of this study is to analyze the behavior of Sinhala comments on YouTube Sinhala song videos using social media comments as primary data sources. These included comments from 27 YouTube videos containing 20 different Sinhala songs, which were carefully selected so that strict linguistic reliability would be maintained and relevancy ensured. This process led to a total of 93,116 comments being gathered upon which the dataset was refined further by advanced filtering methods and transliteration mechanisms resulting into 63,471 Sinhala comments. Additionally, 964 stop-words specific for the Sinhala language were algorithmically derived out of which 182 matched exactly with English stop-words from NLTK corpus once translated. Also, comparisons were made between general domain corpora in Sinhala against the YouTube Comment Corpus in Sinhala confirming latter as good representation of general domain. The meticulously curated data set as well as the derived stop-words form important resources for future research in the fields of MIR and MER, since they could be used and demonstrate that there are possibilities with computational techniques to solve complex musical experiences across varied cultural traditions

artificial intelligence, machine learning, social media, (18 more...)

arXiv.org Artificial Intelligence

2501.18633

Country:

North America > United States > New York (0.04)
Asia > Sri Lanka (0.04)

Genre: Research Report (1.00)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Cognitive Science > Emotion (0.67)

Add feedback

Multi-Dialect Vietnamese: Task, Dataset, Baseline Models and Challenges

Van Dinh, Nguyen, Dang, Thanh Chi, Nguyen, Luan Thanh, Van Nguyen, Kiet

arXiv.org Artificial IntelligenceOct-4-2024

Vietnamese, a low-resource language, is typically categorized into three primary dialect groups that belong to Northern, Central, and Southern Vietnam. However, each province within these regions exhibits its own distinct pronunciation variations. Despite the existence of various speech recognition datasets, none of them has provided a fine-grained classification of the 63 dialects specific to individual provinces of Vietnam. To address this gap, we introduce Vietnamese Multi-Dialect (ViMD) dataset, a novel comprehensive dataset capturing the rich diversity of 63 provincial dialects spoken across Vietnam. Our dataset comprises 102.56 hours of audio, consisting of approximately 19,000 utterances, and the associated transcripts contain over 1.2 million words. To provide benchmarks and simultaneously demonstrate the challenges of our dataset, we fine-tune state-of-the-art pre-trained models for two downstream tasks: (1) Dialect identification and (2) Speech recognition. The empirical results suggest two implications including the influence of geographical factors on dialects, and the constraints of current approaches in speech recognition tasks involving multi-dialect speech data. Our dataset is available for research purposes.

dataset, dialect, experiment, (17 more...)

arXiv.org Artificial Intelligence

2410.03458

Country:

Asia > Vietnam > Hanoi > Hanoi (0.14)
Asia > Vietnam > Thanh Hóa Province > Thanh Hóa (0.04)
Asia > Vietnam > Hưng Yên Province > Hưng Yên (0.04)
(65 more...)

Genre: Research Report > New Finding (0.66)

Industry: Transportation > Ground > Road (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

ChatGPT to Replace Crowdsourcing of Paraphrases for Intent Classification: Higher Diversity and Comparable Model Robustness

Cegin, Jan, Simko, Jakub, Brusilovsky, Peter

arXiv.org Artificial IntelligenceOct-19-2023

The emergence of generative large language models (LLMs) raises the question: what will be its impact on crowdsourcing? Traditionally, crowdsourcing has been used for acquiring solutions to a wide variety of human-intelligence tasks, including ones involving text generation, modification or evaluation. For some of these tasks, models like ChatGPT can potentially substitute human workers. In this study, we investigate whether this is the case for the task of paraphrase generation for intent classification. We apply data collection methodology of an existing crowdsourcing study (similar scale, prompts and seed data) using ChatGPT and Falcon-40B. We show that ChatGPT-created paraphrases are more diverse and lead to at least as robust models.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2305.12947

Country:

North America > United States > New York (0.04)
Europe > Czechia > South Moravian Region > Brno (0.04)
North America > United States > Pennsylvania (0.04)
(9 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Uzbek text's correspondence with the educational potential of pupils: a case study of the School corpus

Madatov, Khabibulla, Matlatipov, Sanatbek, Aripov, Mersaid

arXiv.org Artificial IntelligenceMar-18-2023

One of the major challenges of an educational system is choosing appropriate content considering pupils' age and intellectual potential. In this article the experiment of primary school grades (from 1st to 4th grades) is considered for automatically determining the correspondence of an educational materials recommended for pupils by using the School corpus where it includes the dataset of 25 school textbooks confirmed by the Ministry of preschool and school education of the Republic of Uzbekistan. In this case, TF-IDF scores of the texts are determined, they are converted into a vector representation, and the given educational materials are compared with the corresponding class of the School corpus using the cosine similarity algorithm. Based on the results of the calculation, it is determined whether the given educational material is appropriate or not appropriate for the pupils' educational potential.

artificial intelligence, natural language, text processing, (18 more...)

arXiv.org Artificial Intelligence

2303.00465

Country:

Asia > Uzbekistan > Toshkent Shahri > Tashkent (0.15)
Europe > Poland > Greater Poland Province > Poznań (0.04)
Asia > Central Asia (0.04)

Genre:

Research Report (0.82)
Instructional Material (0.77)

Industry: Education > Educational Setting > K-12 Education > Primary School (0.35)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

Add feedback

Uzbek text summarization based on TF-IDF

Madatov, Khabibulla, Bekchanov, Shukurla, Vičič, Jernej

arXiv.org Artificial IntelligenceMar-1-2023

The volume of information is increasing at an incredible rate with the rapid development of the Internet and electronic information services. Due to time constraints, we don't have the opportunity to read all this information. Even the task of analyzing textual data related to one field requires a lot of work. The text summarization task helps to solve these problems. This article presents an experiment on summarization task for Uzbek language, the methodology was based on text abstracting based on TF-IDF algorithm. Using this density function, semantically important parts of the text are extracted. We summarize the given text by applying the n-gram method to important parts of the whole text. The authors used a specially handcrafted corpus called "School corpus" to evaluate the performance of the proposed method. The results show that the proposed approach is effective in extracting summaries from Uzbek language text and can potentially be used in various applications such as information retrieval and natural language processing. Overall, this research contributes to the growing body of work on text summarization in under-resourced languages.

artificial intelligence, natural language, text processing, (16 more...)

arXiv.org Artificial Intelligence

2303.00461

Country:

Asia > Uzbekistan (0.05)
Europe > Slovenia > Coastal-Karst > Municipality of Koper > Koper (0.04)
Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.70)

Add feedback

Text Classification Using R, Keras, and Comet ML

#artificialintelligenceFeb-17-2023, 18:56:29 GMT

Text classification is an interesting application of natural language processing. It is a supervised learning methodology that predicts if a piece of text belongs to one category or the other. As a machine learning engineer, you start with a labeled data set that has vast amounts of text that have already been categorized. These algorithms can perform sentiment analysis, create spam filters, and other applications. This tutorial will teach you how to train your binary text classifiers using Keras.

comet ml, dataset, tutorial, (16 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.61)

Add feedback

Step by Step Basics: Text Classifier

#artificialintelligenceFeb-17-2023, 17:15:10 GMT

This is one of the most important steps of any data science project. Ensure that you have fully grasped the question that is being asked. Do you have the relevant data available to answer the question? Does your methodology align with what the stakeholder is expecting? If you need stakeholder buy in, don't go building some super complex model that will be hard to interpret.

dataset, majority class, minority class, (15 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Accuracy of the Uzbek stop words detection: a case study on "School corpus"

Madatov, Khabibulla, Bekchanov, Shukurla, Vičič, Jernej

arXiv.org Artificial IntelligenceSep-15-2022

Stop words are very important for information retrieval and text analysis investigation tasks of natural language processing. Current work presents a method to evaluate the quality of a list of stop words aimed at automatically creating techniques. Although the method proposed in this paper was tested on an automatically-generated list of stop words for the Uzbek language, it can be, with some modifications, applied to similar languages either from the same family or the ones that have an agglutinative nature. Since the Uzbek language belongs to the family of agglutinative languages, it can be explained that the automatic detection of stop words in the language is a more complex process than in inflected languages. Moreover, we integrated our previous work on stop words detection in the example of the "School corpus" by investigating how to automatically analyse the detection of stop words in Uzbek texts. This work is devoted to answering whether there is a good way of evaluating available stop words for Uzbek texts, or whether it is possible to determine what part of the Uzbek sentence contains the majority of the stop words by studying the numerical characteristics of the probability of unique words. The results show acceptable accuracy of the stop words lists.

artificial intelligence, natural language, stop word, (16 more...)

arXiv.org Artificial Intelligence

2209.07053

Country:

Europe > Slovenia > Coastal-Karst > Municipality of Koper > Koper (0.05)
Asia > Uzbekistan (0.05)
Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.04)

Genre: Research Report (0.70)

Industry: Education (0.36)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.48)

Add feedback

Fake News Classification with Keras - Analytics Vidhya

#artificialintelligenceJun-27-2022, 17:50:46 GMT

Batch normalization is implemented (if desired) as outlined in the original paper that introduced it, i.e. after the Dense linear transformation but before the non-linear (ReLU) activation. The output layer is just a standard Dense layer with 1 neuron and a sigmoid activation function (that squishes predictions to between 0 and 1), such that our model is ultimately predicting 0 or 1, fake or true. Batch normalization can help speed up training and provides a mild regularizing effect. Both the Keras- and spaCy-embedded models will take a good amount of time to train, but ultimately we'll end up with something that we can evaluate on our test data with. Overall, the Keras-embedded model performed better– achieving a test accuracy of 99.1% vs the spaCy model's 94.8%.

analytic vidhya, dataset, sequence, (12 more...)

#artificialintelligence

Industry: Media > News (0.43)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback